50 research outputs found

    Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing

    Abstract: Large applications executing on Grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The sources of these problems are node failures and the need for dynamic configuration over extensive runtime. This paper presents two fault-tolerance mechanisms called Theft-Induced Checkpointing and Systematic Event Logging. These are transparent protocols capable of overcoming problems associated with both benign faults, i.e., crash faults, and node or subnet volatility. Specifically, the protocols base the state of the execution on a dataflow graph, allowing for efficient recovery in dynamic heterogeneous systems as well as multithreaded applications. By allowing recovery even under different numbers of processors, the approaches are especially suitable for applications with a need for adaptive or reactionary configuration control. The low-cost protocols offer the capability of controlling or bounding the overhead. A formal cost model is presented, followed by an experimental evaluation. It is shown that the overhead of the protocols is very small, and that the maximum work lost by a crashed process is small and bounded. Index Terms: Grid computing, rollback recovery, checkpointing, event logging.
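
    The idea of basing execution state on a dataflow graph can be illustrated with a small sketch. This is not the authors' Theft-Induced Checkpointing or Systematic Event Logging protocol; it is a simplified, single-process illustration in which every name (`run`, `log`, `fail_after`, `traced`) is hypothetical. It assumes that each task's result is written to a log standing in for stable storage, so that after a crash only the tasks missing from the log are re-executed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical task graph: task name -> (function, names of input tasks).
Graph = Dict[str, Tuple[Callable[..., int], List[str]]]


def run(graph: Graph, log: Dict[str, int],
        fail_after: Optional[int] = None) -> Dict[str, int]:
    """Run every task not yet in `log`, in dependency order.

    `log` stands in for stable storage: its entries survive a crash.
    `fail_after` simulates a crash when the next task is about to start
    after that many tasks have been executed in this run.
    """
    executed = 0
    pending = [name for name in graph if name not in log]
    while pending:
        progressed = False
        for name in list(pending):
            func, deps = graph[name]
            if all(d in log for d in deps):
                if fail_after is not None and executed == fail_after:
                    raise RuntimeError("simulated crash")
                log[name] = func(*(log[d] for d in deps))  # log the result
                executed += 1
                pending.remove(name)
                progressed = True
        if not progressed:
            raise ValueError("cycle or missing dependency in graph")
    return log


calls: List[str] = []  # records which tasks actually executed


def traced(name: str, fn: Callable[..., int]) -> Callable[..., int]:
    def wrapper(*args: int) -> int:
        calls.append(name)
        return fn(*args)
    return wrapper


graph: Graph = {
    "a": (traced("a", lambda: 2), []),
    "b": (traced("b", lambda: 3), []),
    "c": (traced("c", lambda a, b: a + b), ["a", "b"]),
    "d": (traced("d", lambda c: c * 10), ["c"]),
}

log: Dict[str, int] = {}
try:
    run(graph, log, fail_after=2)   # crashes before task "c" starts
except RuntimeError:
    pass
log_at_crash = dict(log)            # what survived the crash
run(graph, log)                     # recovery: only "c" and "d" are run
```

    Because recovery depends only on the graph and the log, the second call to `run` could in principle take place on a different machine or with a different number of workers, which is the property the abstract emphasizes.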

    Modeling of checkpointing/rollback strategy towards optimal run time in parallel applications

    We present a mathematical model of a checkpointing/rollback strategy intended to ensure that parallel applications on High Performance Computing (HPC) platforms complete in as little time as possible. This is achieved by minimizing both the computation lost to expected failures and the unnecessary overhead of fault-tolerance mechanisms. We focus on one particular class of component failure, the crash fault, in which a component is at any given moment either working or failed. We study a coordinated checkpointing strategy for fault tolerance that allows the application to continue despite failures.
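
    The trade-off described above, between work lost to failures and checkpoint overhead, can be made concrete with a classic first-order result. The sketch below uses Young's approximation for the checkpoint interval, which is a well-known textbook formula and not the model proposed in this paper. The function names and the example values (a 60-second checkpoint cost, a one-day mean time between failures) are assumptions chosen for illustration.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the best checkpoint interval.

    T_opt = sqrt(2 * C * M), where C is the time to write one checkpoint
    and M is the mean time between failures (same time unit for both).
    Reasonable only when C is much smaller than M.
    """
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float,
                   mtbf: float) -> float:
    """First-order fraction of run time that is not useful work.

    C / T     : share of time spent writing checkpoints.
    T / (2 M) : expected rework, about half an interval per failure.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Assumed example values, not taken from the paper.
C = 60.0       # seconds to write one checkpoint
M = 86400.0    # mean time between failures: one day, in seconds

t_opt = young_interval(C, M)
waste_at_opt = waste_fraction(t_opt, C, M)
waste_too_often = waste_fraction(t_opt / 2.0, C, M)
waste_too_rarely = waste_fraction(t_opt * 2.0, C, M)
```

    At `t_opt` the two terms of `waste_fraction` are equal: checkpointing more often raises the overhead term, and checkpointing less often raises the rework term.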

    Prevalence of physical inactivity and barriers to physical activity among obese attendants at a community health-care center in Karachi, Pakistan

    Background: Overweight and obesity are significant public health problems worldwide with serious health consequences. With increasing urbanization and modernization there has been an increase in the prevalence of obesity that is attributed to reduced levels of physical activity (PA). However, little is known about the prevalence of physical inactivity and the factors that inhibit physical activity among the Pakistani population. This cross-sectional study aimed to estimate the prevalence of physical inactivity and to determine associated barriers in obese attendants accompanying patients to a Community Health Center in Karachi, Pakistan. Findings: PA was assessed using the International Physical Activity Questionnaire (IPAQ). Barriers to PA were also assessed in inactive obese attendants. A pre-tested questionnaire was used to collect data from a total of 350 obese attendants. Among the 350 study participants, 254 (72.6%) were found to be physically inactive (95% CI: 68.0%, 77.2%). Multivariable logistic regression analysis indicated that age greater than 33 years, BMI greater than 33 kg/m² and family history of obesity were independently and significantly associated with physical inactivity. Moreover, there was a significant interaction between family structure and gender: females living in extended families were about twice as likely to be inactive, whereas males from extended families were six times more likely to be inactive, relative to females from nuclear families. Lack of information, motivation and skills, spouse and family support, accessibility to places for physical activity, cost-effective facilities and time were found to be important barriers to PA. Conclusions: Considering the public health implications of physical inactivity, it is essential to promote PA in the context of an individual's health and environment. The findings highlight considerable barriers to PA among obese individuals that need to be addressed during counseling sessions with physicians.

    Common variants in CLDN2 and MORC4 genes confer disease susceptibility in patients with chronic pancreatitis

    A recent Genome-wide Association Study (GWAS) identified associations of variants at the X-linked CLDN2 and MORC4 loci and at the PRSS1-PRSS2 locus with Chronic Pancreatitis (CP) in North American patients of European ancestry. We selected 9 variants from the reported GWAS and replicated the association with CP in Indian patients by genotyping 1807 unrelated Indians of Indo-European ethnicity, including 519 patients with CP and 1288 controls. The etiology of CP was idiopathic in 83.62% and alcoholic in 16.38% of the 519 patients. Our study confirmed a significant association of 2 variants in the CLDN2 gene (rs4409525: OR 1.71, P = 1.38 × 10⁻⁹; rs12008279: OR 1.56, P = 1.53 × 10⁻⁴) and 2 variants in the MORC4 gene (rs12688220: OR 1.72, P = 9.20 × 10⁻⁹; rs6622126: OR 1.75, P = 4.04 × 10⁻⁵) in Indian patients with CP. We also found significant associations at the PRSS1-PRSS2 locus (OR 0.60; P = 9.92 × 10⁻⁶) and at SAMD12-TNFRSF11B (OR 0.49, 95% CI [0.31–0.78], P = 0.0027). A variant in the MORC4 gene (rs12688220) showed a significant interaction with alcohol (OR for homozygous and heterozygous risk allele: 14.62 and 1.51, respectively; P = 0.0068), suggesting a gene-environment interaction. A combined analysis of the CLDN2 and MORC4 genes based on an effective risk allele score revealed a higher percentage of individuals homozygous for the risk allele among CP cases, with a 5.09-fold enhanced risk in individuals with 7 or more effective risk alleles compared with individuals with 3 or fewer risk alleles (P = 1.88 × 10⁻¹⁴). Genetic variants in the CLDN2 and MORC4 genes were associated with CP in Indian patients.

    Programmation des systèmes parallèles distribués : tolérance aux pannes, résilience et adaptabilité

    No full text
    Grid and cluster architectures are gaining in popularity for scientific computing applications. The distributed computations, as well as their underlying infrastructure consisting of a large number of computers, storage and networking devices, pose challenges in overcoming the effects of node failures. This work presents a new checkpoint/recovery method for dataflow computations using work-stealing in heterogeneous environments as found in grid or cluster computing. Basing the state of the computation on a dynamic macro dataflow graph, it is shown that the mechanisms provide effective checkpointing for multithreaded applications in heterogeneous environments. Two methods are presented, i.e., Systematic Event Logging (SEL) and Theft-Induced Checkpointing (TIC), which are efficient and extremely flexible under the system-state model, allowing for recovery on different platforms under different numbers of processors. A formal analysis of the overhead induced by both methods is presented, followed by an experimental evaluation on a large platform. It is shown that both methods have very small overhead and that trade-offs between checkpointing and recovery cost can be controlled. Grids and clusters are increasingly used architectures in distributed scientific computing. The large number of heterogeneous components (processors, memory, interconnect) in these dynamic architectures makes the risk of failure very high. Given the considerable execution time of a distributed parallel application, this risk must be controlled through fault-tolerance techniques. In this work, the execution state of a parallel program is represented by a dynamic dataflow graph built at run time. This description of the parallelism is independent of the number of resources and is therefore exploited to solve the problems arising from the dynamic nature of the platforms considered. Defining portable formats for representing the nodes of the graph solves the heterogeneity problems. Saving an application's dataflow graph while it runs on a platform provides checkpoints for that application. Recovery is then possible on a different type or number of processes. Two checkpoint/recovery methods, with a formal analysis of their complexity, are presented: SEL (Systematic Event Logging) and TIC (Theft-Induced Checkpointing). Experimental measurements of a prototype on representative applications show that the run-time overhead can be amortized, making scalable fault-tolerant executions feasible.

    Modelling of shallow-water equations by using compact MacCormack-Type schemes with application to dam-break problem

    No full text

    Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing

    No full text
    International audience

    Programmation des systèmes parallèles distribués (tolérance aux pannes, résilience et adaptabilité)

    No full text
    Grids and clusters are increasingly used architectures in distributed scientific computing. The large number of heterogeneous components (processors, memory, interconnect) in these dynamic architectures makes the risk of failure very high. Given the considerable execution time of a distributed parallel application, this risk must be controlled through fault-tolerance techniques. In this work, the execution state of a parallel program is represented by a dynamic dataflow graph built at run time. This description of the parallelism is independent of the number of resources and is therefore exploited to solve the problems arising from the dynamic nature of the platforms considered. Defining portable formats for representing the nodes of the graph solves the heterogeneity problems. Saving an application's dataflow graph while it runs on a platform provides checkpoints for that application. Recovery is then possible on a different type or number of processes. Two checkpoint/recovery methods, with a formal analysis of their complexity, are presented: SEL (Systematic Event Logging) and TIC (Theft-Induced Checkpointing). Experimental measurements of a prototype on representative applications show that the run-time overhead can be amortized, making scalable fault-tolerant executions feasible.

    Modèle de coût algorithmique intégrant des mécanismes de tolérance aux pannes et expérimentations

    No full text
    National audience. Grids and clusters are increasingly used architectures in distributed scientific computing. The large number of components (processors, memory, interconnect) in these architectures makes the risk of failure very high. Given the considerable execution time of a distributed application, this risk must be controlled through fault-tolerance techniques. In this article, we present two fault-tolerance mechanisms based on saving the state of the future of the execution, represented by a dataflow graph. We present their algorithmic cost models, which include the time needed to save the state of the processes. We show that, for the class of programs and the fault-tolerance mechanisms considered, the asymptotic speedups are linear in the number of processors. A prototype exists, and experiments show that the run-time overhead can be amortized, making scalable fault-tolerant executions feasible. Experimental comparisons on a cluster of about 200 processors complement the theoretical analyses.
